End-to-end credit risk system on 30,000 records. Champion/challenger architecture: WOE scorecard vs XGBoost — both tracked via MLflow. Monte Carlo simulation (500k scenarios) for VaR & CVaR. Three-scenario stress testing (Base / Mild / Severe). Fully deployed FastAPI backend + interactive HTML dashboard. IFRS 9 and Basel III methodologies applied.
The Credit Risk Modelling Platform is an end-to-end credit risk analytics system built on the
UCI Taiwan Credit Card Default dataset (30,000 records). Designed to meet IFRS 9 Expected Credit Loss (ECL)
and Basel III stress-testing requirements, it implements a
champion/challenger model architecture — a WOE-based Logistic Regression scorecard
as the champion and an XGBoost classifier as the challenger — enabling real-time comparison of
interpretable and ensemble-based risk signals.
The platform exposes a FastAPI backend with a business-friendly input layer (10 plain-language
fields abstracting 21 raw features) and a fully interactive HTML/CSS/JavaScript frontend
covering prediction, portfolio analytics, Monte Carlo simulation, stress testing, and sensitivity analysis.
Experiment tracking is handled by MLflow and the training pipeline is managed via
DVC with a params.yaml configuration file.
Problem Statement
Financial institutions face significant uncertainty in credit decision-making. Without a structured,
data-driven approach, lenders rely on subjective judgment — leading to inconsistent approvals,
under-provisioned capital buffers, and regulatory non-compliance. Key challenges include:
No consistent methodology for evaluating borrower default risk across a portfolio.
Inability to quantify portfolio-level exposure — lenders cannot see total expected losses
or how losses concentrate in tail scenarios.
Poor stress resilience visibility — banks cannot answer "what happens to our losses if
the economy contracts by 30%?" without manual, one-off analyses.
Regulatory pressure — IFRS 9 mandates ECL reporting; Basel III requires stress-tested
capital adequacy — both are difficult to produce without an automated system.
Model opacity — black-box models cannot explain a credit decision to a borrower or
regulator, creating legal and reputational risk.
The platform addresses all five: it produces consistent, explainable credit scores; quantifies ECL
across the full portfolio; stress-tests losses under adverse scenarios; and maintains a challenger model
for ongoing performance benchmarking, all within a single system.
Key Insights
Champion/Challenger architecture allows continuous model benchmarking — the scorecard
provides regulatory-friendly explainability while XGBoost maximises predictive accuracy.
WOE transformation enforces monotonic risk relationships and produces an interpretable
credit score in the 576–906 range, with score bands mapping directly to lending decisions.
Business input layer abstracts 21 raw dataset columns into 10 plain-language fields —
non-technical users never interact with raw model features.
Monte Carlo simulation (up to 500,000 scenarios) quantifies tail risk metrics — Value
at Risk (VaR) and Conditional VaR (CVaR) — that deterministic ECL alone cannot capture.
Three-scenario stress testing (Base, Mild +30% PD, Severe +80% PD) lets risk managers
evaluate capital adequacy before adverse conditions materialise.
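Sensitivity analysis shows ECL responding about 2.2× more strongly to LGD shifts than to PD shifts.
The sweep applies absolute shifts to LGD and relative shifts to PD, so the comparison is not like-for-like, but on those terms improving recoveries moves ECL more than tightening credit selection.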
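The three stress scenarios listed above amount to a PD multiplier applied before the ECL sum. The sketch below is a minimal illustration under assumed inputs: the `Loan` class, `portfolio_ecl`, and the three-loan portfolio are hypothetical, and only the multipliers (1.0, 1.3, 1.8) come from the scenario definitions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Loan:
    pd: float   # probability of default, 0..1
    lgd: float  # loss given default, 0..1
    ead: float  # exposure at default, in currency units


# PD multipliers from the scenario definitions: Base, Mild (+30%), Severe (+80%).
SCENARIOS = {"Base": 1.0, "Mild": 1.3, "Severe": 1.8}


def portfolio_ecl(loans, pd_multiplier=1.0):
    """Sum PD x LGD x EAD over the portfolio, capping the stressed PD at 1.0."""
    return sum(
        min(loan.pd * pd_multiplier, 1.0) * loan.lgd * loan.ead
        for loan in loans
    )


# Hypothetical portfolio, for illustration only.
portfolio = [
    Loan(pd=0.05, lgd=0.45, ead=10_000),
    Loan(pd=0.20, lgd=0.45, ead=5_000),
    Loan(pd=0.70, lgd=0.45, ead=2_000),
]

stressed = {name: portfolio_ecl(portfolio, m) for name, m in SCENARIOS.items()}
for name, loss in stressed.items():
    print(f"{name:<7} ECL = {loss:,.1f}")
```

Because the stressed PD is capped at 1.0, the Severe loss in this example is less than 1.8× the Base loss. Whether the platform caps PDs the same way is not stated in the description.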
Technical Implementation
Model Architecture:
Scorecard: raw data → feature engineering (15+ derived features) → WOE binning via
scorecardpy → Logistic Regression → additive points table → credit score.
Challenger: same engineered features → sklearn Pipeline (OrdinalEncoder +
XGBClassifier with scale_pos_weight=3.52 for class imbalance) → probability output.
Feature selection removed collinear, legally sensitive, and negative-coefficient columns;
manual WOE bin breaks enforced monotonicity for the UTILIZATION feature.
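To make the WOE step concrete, here is a dependency-free sketch of how Weight of Evidence and Information Value are computed for one binned feature, plus the monotonicity check that motivates manual bin breaks. It is a stand-in for illustration, not the scorecardpy code path: the bin counts are invented, `woe_table` and `is_monotonic` are hypothetical helpers, and the sign convention used here (log of good share over bad share) is one of two in common use.

```python
import math


def woe_table(bins):
    """bins: list of (label, n_good, n_bad). Returns ({label: woe}, total_iv).

    Assumes every bin has at least one good and one bad record;
    empty cells would need smoothing before taking the log.
    """
    total_good = sum(good for _, good, _ in bins)
    total_bad = sum(bad for _, _, bad in bins)
    woes, iv = {}, 0.0
    for label, good, bad in bins:
        good_share = good / total_good
        bad_share = bad / total_bad
        woe = math.log(good_share / bad_share)
        woes[label] = woe
        iv += (good_share - bad_share) * woe
    return woes, iv


def is_monotonic(values):
    """True if the sequence never changes direction."""
    pairs = list(zip(values, values[1:]))
    return all(a >= b for a, b in pairs) or all(a <= b for a, b in pairs)


# Hypothetical utilisation bins: (label, non-defaulters, defaulters).
utilization_bins = [
    ("0-20%", 400, 20),
    ("20-50%", 300, 30),
    ("50-80%", 200, 50),
    (">80%", 100, 100),
]

woes, iv = woe_table(utilization_bins)
for label, woe in woes.items():
    print(f"{label:<7} WOE = {woe:+.3f}")
print(f"IV = {iv:.3f}, monotonic = {is_monotonic(list(woes.values()))}")
```

If automatic binning produced a WOE sequence that changed direction, merging or moving bin edges until `is_monotonic` holds is the kind of manual adjustment described for UTILIZATION.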
Training Pipeline (mlops/):
Reproducible end-to-end pipeline: load → clean → engineer → select → WOE bin → two-pass LR
(first pass surfaces negative coefficients) → final LR + scorecard table → XGBoost with 5-fold CV.
Hyperparameters versioned in params.yaml; all runs logged to
MLflow including AUC, KS statistic, Gini coefficient, confusion matrix,
and ROC curve artifacts.
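For readers unfamiliar with the logged metrics, the sketch below shows how AUC, Gini, and the KS statistic relate to each other on a tiny hand-made sample. It is written in plain Python for clarity and is not the pipeline's code; in practice a library routine such as scikit-learn's `roc_auc_score` would normally be used. The labels and scores are invented.

```python
def auc_score(y_true, y_score):
    """Probability that a random defaulter outscores a random non-defaulter.

    Ties count as half a win. O(n^2), fine for illustration only.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def ks_statistic(y_true, y_score):
    """Largest gap between the true-positive and false-positive rates."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    best = 0.0
    for threshold in sorted(set(y_score)):
        flagged = [y for y, s in zip(y_true, y_score) if s >= threshold]
        tpr = sum(flagged) / n_pos
        fpr = (len(flagged) - sum(flagged)) / n_neg
        best = max(best, abs(tpr - fpr))
    return best


# Invented sample: 1 = default, scores are predicted default probabilities.
y_true = [0, 0, 1, 0, 1, 1, 0, 1]
y_score = [0.10, 0.20, 0.35, 0.40, 0.60, 0.70, 0.30, 0.90]

auc = auc_score(y_true, y_score)
gini = 2 * auc - 1  # Gini is a linear rescaling of AUC
ks = ks_statistic(y_true, y_score)
print(f"AUC={auc:.4f}  Gini={gini:.4f}  KS={ks:.4f}")
```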
API Layer (FastAPI):
Six route groups: /predict, /predict/business, /ecl,
/simulate, /stress-test, /sensitivity.
Pydantic schemas enforce input validation; CORS middleware allows the standalone frontend
to call the API without a proxy.
Business input routes accept 10 plain-language fields and internally call
input_mapper.py to reconstruct all 21 raw dataset columns before inference.
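The business-to-raw mapping can be pictured roughly as follows. Everything in this sketch is an assumption made for illustration: the six field names are hypothetical (the real schema has 10 fields), the validation rules are invented, and the choice of 21 columns (the public UCI columns minus SEX and MARRIAGE) is a guess at what the platform keeps. It uses a plain dataclass rather than Pydantic so that it runs without third-party packages, and it is not the contents of `input_mapper.py`.

```python
from dataclasses import dataclass

# Education codes as documented for the UCI dataset.
EDUCATION_CODES = {"graduate": 1, "university": 2, "high_school": 3, "other": 4}


@dataclass(frozen=True)
class BusinessInput:
    """Hypothetical plain-language request (subset of fields, invented names)."""
    credit_limit: float
    age: int
    education: str
    months_behind: int          # current repayment delay, 0 = on time
    avg_monthly_bill: float
    avg_monthly_payment: float

    def __post_init__(self):
        # Reject obviously invalid input before it reaches the model.
        if self.credit_limit <= 0 or not 18 <= self.age <= 100:
            raise ValueError("credit_limit must be positive and age within 18-100")
        if self.education not in EDUCATION_CODES:
            raise ValueError(f"unknown education level: {self.education!r}")
        if self.months_behind < 0:
            raise ValueError("months_behind cannot be negative")


def to_raw_features(inp):
    """Expand the business fields into 21 UCI-style columns (assumed set)."""
    raw = {
        "LIMIT_BAL": inp.credit_limit,
        "EDUCATION": EDUCATION_CODES[inp.education],
        "AGE": inp.age,
    }
    # The UCI repayment-status columns are PAY_0, PAY_2, ..., PAY_6.
    for suffix in ("0", "2", "3", "4", "5", "6"):
        raw[f"PAY_{suffix}"] = inp.months_behind
    # Repeat the monthly averages across the six billing months.
    for month in range(1, 7):
        raw[f"BILL_AMT{month}"] = inp.avg_monthly_bill
        raw[f"PAY_AMT{month}"] = inp.avg_monthly_payment
    return raw


example = to_raw_features(
    BusinessInput(50_000, 35, "university", 0, 12_000.0, 3_000.0)
)
print(len(example), "raw columns:", sorted(example))
```

Keeping this expansion on the server side means the frontend only ever sends the plain-language fields, so the raw feature set can change without changing the request shape.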
Risk Analytics Services:
ECL service — computes PD × LGD × EAD per borrower with optional
segment-level breakdown; returns individual and portfolio totals.
Monte Carlo service — vectorised NumPy simulation producing Expected Loss,
Unexpected Loss, VaR, CVaR, min/max, and a 200-point loss distribution sample for charting.
Stress testing service — applies PD multipliers and LGD overrides from
risk_config.py; optionally overlays Monte Carlo on each scenario.
Sensitivity service — sweeps relative PD shifts and absolute LGD shifts,
returning ECL change percentage and elasticity at each point.
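The difference between a relative PD shift and an absolute LGD shift matters when reading the sensitivity output, so a small worked sweep may help. The base values below (PD 0.10, LGD 0.45, EAD 1,000,000) are assumptions chosen for illustration, and `ecl` and `pct_change` are hypothetical helpers, not the service's code.

```python
def ecl(pd, lgd, ead):
    """Expected credit loss for a single exposure."""
    return pd * lgd * ead


def pct_change(new, base):
    """Percentage change of new relative to base."""
    return (new - base) / base * 100.0


# Assumed base case (not taken from the platform's configuration).
BASE_PD, BASE_LGD, EAD = 0.10, 0.45, 1_000_000
SHIFTS = (-0.20, -0.10, 0.10, 0.20)

base = ecl(BASE_PD, BASE_LGD, EAD)

# Relative PD shift: PD is scaled by (1 + shift), capped at 1.0.
pd_sweep = {
    s: pct_change(ecl(min(BASE_PD * (1 + s), 1.0), BASE_LGD, EAD), base)
    for s in SHIFTS
}

# Absolute LGD shift: LGD moves by `shift` itself, clamped to [0, 1].
lgd_sweep = {
    s: pct_change(ecl(BASE_PD, min(max(BASE_LGD + s, 0.0), 1.0), EAD), base)
    for s in SHIFTS
}

for s in SHIFTS:
    print(f"shift {s:+.2f}: PD -> {pd_sweep[s]:+6.1f}%   LGD -> {lgd_sweep[s]:+6.1f}%")

# ECL change per unit of shift, LGD relative to PD.
ratio = lgd_sweep[0.10] / pd_sweep[0.10]
print(f"LGD/PD sensitivity ratio: {ratio:.2f}")
```

Under these assumed inputs a 0.10 shift changes ECL by 10% via PD and by about 22% via LGD, a ratio of roughly 2.2. That is close to the ~2.2× figure quoted under Key Insights and may be where it comes from, but the base LGD used by the platform is not stated, so treat this as one possible reading.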
Frontend Dashboard (HTML/CSS/JS):
Six pages: Dashboard, Prediction, Risk Analytics, Simulation, Stress Test, Sensitivity —
all sharing a unified dark-themed design system.
Stress test results rendered via Chart.js bar chart; simulation results
display VaR / CVaR metrics in a responsive grid layout.
Key Learnings
Scorecard development requires two LR passes — the first surfaces negative
coefficients that violate monotonicity; removing them before the second pass is standard
industry practice, not optional cleanup.
WOE binning is sensitive to auto-generated boundaries; manual breaks are
sometimes necessary to enforce the risk ordering regulators expect.
Separating the business input layer from raw model features is architecturally
essential — it decouples frontend UX from model internals and makes the API safe for
non-technical integrations.
Monte Carlo simulation reveals tail risk that deterministic ECL masks completely —
two portfolios with identical ECL can have very different VaR profiles depending on
PD distribution shape.
MLflow experiment tracking becomes indispensable the moment you run more than
a handful of training experiments; reproducing a specific run without it is extremely difficult.
Regulatory frameworks (IFRS 9, Basel III) are not abstract — building to their
requirements from the start (ECL methodology, stress scenario definitions, model documentation)
is far cheaper than retrofitting compliance later.
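The point about identical ECL and different VaR can be checked with a small simulation. The sketch below is a dependency-free stand-in, not the platform's vectorised NumPy service: it assumes independent defaults, equal exposures, a fixed LGD, and only 5,000 scenarios so that pure Python stays fast. Both portfolios and all helper names are hypothetical.

```python
import random
from statistics import fmean


def simulate_losses(pds, ead, lgd, n_scenarios, seed):
    """Draw independent Bernoulli defaults and return one loss per scenario."""
    rng = random.Random(seed)
    losses = []
    for _ in range(n_scenarios):
        defaults = sum(1 for p in pds if rng.random() < p)
        losses.append(defaults * ead * lgd)
    return losses


def var_cvar(losses, alpha=0.99):
    """Empirical VaR (alpha quantile) and CVaR (mean loss at or beyond it)."""
    ordered = sorted(losses)
    idx = min(int(alpha * len(ordered)), len(ordered) - 1)
    return ordered[idx], fmean(ordered[idx:])


EAD, LGD, N = 1_000.0, 0.45, 5_000

# Portfolio A: 200 loans, all with PD 5%.
pds_a = [0.05] * 200
# Portfolio B: same total PD, concentrated in 20 high-risk loans.
pds_b = [0.455] * 20 + [0.005] * 180

ecl_a = sum(pds_a) * EAD * LGD
ecl_b = sum(pds_b) * EAD * LGD

var_a, cvar_a = var_cvar(simulate_losses(pds_a, EAD, LGD, N, seed=1))
var_b, cvar_b = var_cvar(simulate_losses(pds_b, EAD, LGD, N, seed=2))

print(f"A: ECL={ecl_a:,.0f}  VaR99={var_a:,.0f}  CVaR99={cvar_a:,.0f}")
print(f"B: ECL={ecl_b:,.0f}  VaR99={var_b:,.0f}  CVaR99={cvar_b:,.0f}")
```

With independent defaults and equal exposures, spreading the same total PD evenly gives the wider loss distribution, so portfolio A ends up with the higher 99% VaR even though both ECL figures match. Correlated defaults or uneven exposures, which this sketch leaves out, would change the picture further.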
Future Work
Add a model card with fairness metrics and feature importance for regulatory documentation.
Integrate a real-time data pipeline (Kafka or Airflow) so the platform ingests live transaction data rather than batch uploads.
Implement model drift detection — PSI (Population Stability Index) monitoring on score distributions over time.
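As a starting point for the drift-detection item, PSI compares the share of records in each score band between a baseline and a recent sample. The sketch below is a minimal version under assumed inputs: the five band counts are invented, and `psi` is a hypothetical helper rather than existing platform code.

```python
import math


def psi(expected_counts, actual_counts, eps=1e-6):
    """Population Stability Index over matching score bands.

    Shares are floored at `eps` so that an empty band does not
    produce a division by zero or log of zero.
    """
    e_total = sum(expected_counts)
    a_total = sum(actual_counts)
    total = 0.0
    for e, a in zip(expected_counts, actual_counts):
        e_share = max(e / e_total, eps)
        a_share = max(a / a_total, eps)
        total += (a_share - e_share) * math.log(a_share / e_share)
    return total


# Invented counts per score band, lowest scores first.
baseline = [100, 200, 400, 200, 100]   # e.g. scores at training time
current = [200, 300, 300, 150, 50]     # e.g. scores in a recent month

psi_same = psi(baseline, baseline)
psi_shift = psi(baseline, current)
print(f"PSI vs itself: {psi_same:.4f}")
print(f"PSI vs recent: {psi_shift:.4f}")
```

A widely used rule of thumb treats PSI below 0.1 as stable, 0.1 to 0.25 as a moderate shift worth investigating, and above 0.25 as a significant shift; the invented example lands in the middle band.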